Reclink: aplicativo para o relacionamento de bases de dados, implementando o método probabilistic record linkage
Reclink: an application for database linkage implementing the probabilistic record linkage method
Abstract
This paper presents a system for database linkage based on the probabilistic record linkage technique, developed in C++ with the Borland C++ Builder version 3.0 programming environment. The system was tested by linking data sources of different sizes and was evaluated in terms of both processing time and sensitivity in identifying true record pairs. Record processing took significantly less time with the program than with manual processing, especially for larger databases. Manual and automatic processes had equivalent sensitivities for databases with fewer records. However, as the number of records grew, the sensitivity of the manual process clearly fell, while that of the automatic process did not. Although still in its initial stage of development, the system performed well in terms of both processing speed and sensitivity. Overall algorithm performance was satisfactory, but we intend to evaluate other routines in an attempt to improve the system's performance.
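The abstract does not describe how record pairs are scored, so the following is only a minimal sketch of the standard Fellegi-Sunter style of probabilistic record linkage, not Reclink's actual routines. The field names, the m/u probabilities, and the two thresholds are all hypothetical values chosen for illustration. The last function shows sensitivity as it is usually defined: the share of true pairs that the process identifies.

```python
import math

# Hypothetical per-field parameters (assumed for illustration, not from the paper):
# m = P(field agrees | the two records are a true match)
# u = P(field agrees | the two records are not a match)
FIELD_PARAMS = {
    "name": (0.92, 0.01),
    "birth_date": (0.90, 0.05),
    "sex": (0.98, 0.50),
}


def field_weight(agrees, m, u):
    """Log2 likelihood-ratio weight for one field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def pair_score(rec_a, rec_b):
    """Sum the agreement/disagreement weights over all compared fields."""
    return sum(
        field_weight(rec_a[field] == rec_b[field], m, u)
        for field, (m, u) in FIELD_PARAMS.items()
    )


def classify(score, lower=0.0, upper=8.0):
    """Assumed thresholds: link above upper, non-link below lower, else manual review."""
    if score >= upper:
        return "link"
    if score <= lower:
        return "non-link"
    return "review"


def sensitivity(true_pairs_found, true_pairs_total):
    """Proportion of true record pairs that the linkage process identified."""
    return true_pairs_found / true_pairs_total


a = {"name": "MARIA SILVA", "birth_date": "1970-03-02", "sex": "F"}
b = {"name": "MARIA SILVA", "birth_date": "1970-03-02", "sex": "F"}
c = {"name": "MARIA SOUZA", "birth_date": "1970-03-02", "sex": "F"}
d = {"name": "JOAO PEREIRA", "birth_date": "1981-11-20", "sex": "M"}

print(classify(pair_score(a, b)))  # all fields agree
print(classify(pair_score(a, c)))  # name differs, other fields agree
print(classify(pair_score(a, d)))  # no field agrees
```

Exact string equality is used here only to keep the sketch short; practical systems typically standardize fields and use approximate string comparison before computing weights.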
Similar resources
Sensitivity of probabilistic record linkage for reported birth identification: Pró-Saúde Study.
The objective of the study was to evaluate the sensitivity of probabilistic record linkage for reported birth identification. Data from the Pró-Saúde Study cohort population were used comprising technical-administrative staff at a university in Rio de Janeiro, Brazil, in 1999. A total of 92 records of subjects were linked to the database of the Brazilian Information System on Live Births (SINAS...
Accuracy of probabilistic and deterministic record linkage: the case of tuberculosis
OBJECTIVE To analyze the accuracy of deterministic and probabilistic record linkage to identify TB duplicate records, as well as the characteristics of discordant pairs. METHODS The study analyzed all TB records from 2009 to 2011 in the state of Rio de Janeiro. A deterministic record linkage algorithm was developed using a set of 70 rules, based on the combination of fragments of the key vari...
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When comprehensive information about a topic is scattered across two or more data sets, using only one of them loses the information available in the others. Hence, it is necessary to integrate the scattered information into a single comprehensive data set. On the other hand, sometimes we are interested in recognizing duplicates within a data set. The i...
Uma Avaliação de Eficiência e Eficácia da Combinação de Técnicas para Deduplicação de Dados
Data Deduplication is the task of identifying and eliminating duplicate records in a single database. It is a complex process that involves several steps, including: defining blocking key, similarity function and indexing method. There are several approaches for each of these steps. In this context, the objective of this work is to find the best combination for such algorithms aiming to improve...
Tuberculosis and diabetes: probabilistic linkage of databases to study the association between both diseases.
OBJECTIVE: to describe the profile of cases of tuberculosis and diabetes comorbidity in Brazil. METHODS: this is a descriptive study with data from the Brazilian Information System for Notifiable Diseases - tuberculosis (Sinan-TB) and from the System of Registration and Monitoring of Hypertension and Diabetes Mellitus (Hiperdia), from 2007 to 2011; probabilistic linkage was carried out with R...
Publication date: 2000